Maftools: Efficient analysis, visualization and summarization of MAF files from large-scale cohort based cancer studies

Authors

  • Anand Mayakonda
  • H Phillip Koeffler
Abstract

Mutation Annotation Format (MAF) has become a standard file format for storing somatic and germline variants derived from sequencing large cohorts of cancer samples. A MAF file lists all variants detected in a sample, along with the annotations associated with each putative variant. MAF files form the basis for many downstream analyses and provide a complete landscape of the cohort. Here we introduce maftools, an R package that provides a rich set of functions for analyzing, visualizing and summarizing MAF files. Maftools uses the data.table library for faster processing and summarization, and ggplot2 for generating publication-quality visualizations. Maftools also takes advantage of the S4 class system for better data representation, with flexible, easy-to-use functions.

Availability and Implementation: maftools is implemented as an R package available at https://github.com/PoisonAlien/maftools

Contact: [email protected]

Introduction

With advances in cancer genomics and the falling cost per base of sequencing, sequencing large cohorts of cancer patients has become an efficient way of determining the genetic abnormalities associated with the disease [1-4]. Such cohort-based studies often produce large amounts of data in the form of somatic and germline variants, comprising single nucleotide variants (SNVs) and small insertions/deletions (INDELs). These data are generally stored in Mutation Annotation Format and provide a complete genomic landscape of the cohort [5]. The Cancer Genome Atlas (TCGA) project has sequenced over 30 different types of cancer and stores the resulting somatic variants as MAF files, with several independent studies following the same practice [6]. MAF files provide baseline data for many downstream analyses, such as driver gene detection, detection of mutually exclusive sets of events, mutational signature analysis and tumor heterogeneity estimation [7-10].
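To make the tabular layout of a MAF file concrete, the following is a minimal sketch of the kind of per-sample and per-gene summarization described here. It is written in plain Python for illustration and is not the maftools implementation or its API (maftools is an R package). The column names follow the public MAF convention (`Hugo_Symbol`, `Tumor_Sample_Barcode`, `Variant_Classification`, `Variant_Type`). The sample rows and the helper name `summarize_maf` are made up for this example.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical, made-up MAF excerpt (tab-separated). Real MAF files carry
# many more columns and may begin with '#' comment lines.
MAF_TEXT = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\tVariant_Type\n"
    "TP53\tS1\tMissense_Mutation\tSNP\n"
    "TP53\tS2\tNonsense_Mutation\tSNP\n"
    "FLT3\tS1\tIn_Frame_Ins\tINS\n"
    "DNMT3A\tS3\tMissense_Mutation\tSNP\n"
    "TP53\tS1\tFrame_Shift_Del\tDEL\n"
)


def summarize_maf(handle):
    """Summarize a MAF-like table (illustrative helper, not a maftools function).

    Returns a dict with:
      variants_per_sample    -- number of variant rows per sample
      classification_counts  -- number of variant rows per Variant_Classification
      mutated_samples        -- number of distinct samples mutated per gene
    """
    variants_per_sample = Counter()
    classification_counts = Counter()
    samples_per_gene = defaultdict(set)

    # Skip comment lines so the first remaining line is the column header.
    rows = (line for line in handle if not line.startswith("#"))
    for row in csv.DictReader(rows, delimiter="\t"):
        sample = row["Tumor_Sample_Barcode"]
        variants_per_sample[sample] += 1
        classification_counts[row["Variant_Classification"]] += 1
        samples_per_gene[row["Hugo_Symbol"]].add(sample)

    mutated_samples = {gene: len(s) for gene, s in samples_per_gene.items()}
    return {
        "variants_per_sample": variants_per_sample,
        "classification_counts": classification_counts,
        "mutated_samples": mutated_samples,
    }


summary = summarize_maf(io.StringIO(MAF_TEXT))
# List genes by the number of distinct samples in which they are mutated.
for gene, n in sorted(summary["mutated_samples"].items(), key=lambda kv: -kv[1]):
    print(gene, n)
```

Counting distinct samples per gene, rather than variant rows, keeps a gene with several hits in one sample from being over-counted. That per-sample view is what cohort-level summaries and oncoplot-style displays are generally built on.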
Visualization also plays a key role in genomic studies, and researchers often struggle to generate publication-quality images such as oncoplots (also known as waterfall plots), lollipop plots and oncoprints, to name a few. Although MAF files are becoming standardized, the bioinformatics community currently lacks software to process them. We therefore developed maftools to process, summarize and analyze MAF files resulting from large cohort-based studies. Maftools provides plotting functions that visualize the data stored in MAF files and help researchers generate publication-quality images. It also implements functions for some of the common analyses in cancer studies, including disease-associated driver gene detection, mutual exclusivity analysis and tumor heterogeneity estimation. Beyond the analysis of MAF files, maftools provides functions to integrate and visualize copy number data. Maftools is implemented as an open source R package, and its usage is straightforward, with self-explanatory functions.

bioRxiv preprint first posted online May 11, 2016; doi: http://dx.doi.org/10.1101/052662. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.




Publication date: 2016